Co-clustering Documents and Words by Minimizing the Normalized Cut Objective Function
Abstract
This paper builds on a word-document co-clustering model introduced independently in 2001 by several authors, including I.S. Dhillon, H. Zha and C. Ding. The model builds a bipartite graph from word frequencies in documents, whose vertices are both the documents and the words. The bipartite graph is then partitioned so as to minimize the normalized cut objective function, which yields the document clustering. The fusion-fission graph partitioning metaheuristic is applied to several document collections using this word-document co-clustering model. The results expose a real problem with the model: the partitions found almost always have a lower normalized cut value than the original clustering of the document collection. Moreover, measures of solution quality appear to be relatively independent of the normalized cut values of the partitions.
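As a rough illustration of the objective described in the abstract, the sketch below computes a normalized cut value for a given co-clustering of a small word-document bipartite graph. It assumes the common k-way definition (for each cluster, the weight of edges leaving it divided by the sum of the degrees of its vertices), which may differ in detail from the formulation used in the paper. The function name `normalized_cut`, the label arrays and the toy matrix are hypothetical and chosen only for this example.

```python
import numpy as np


def normalized_cut(A, word_labels, doc_labels):
    """Normalized cut of a co-clustering of a bipartite word-document graph.

    A[i, j] is the frequency of word i in document j and is used as the
    weight of the edge between word vertex i and document vertex j.
    Assumed definition: sum over clusters of cut(cluster) / volume(cluster).
    """
    A = np.asarray(A, dtype=float)
    word_labels = np.asarray(word_labels)
    doc_labels = np.asarray(doc_labels)

    word_deg = A.sum(axis=1)  # degree of each word vertex
    doc_deg = A.sum(axis=0)   # degree of each document vertex

    total = 0.0
    for k in np.union1d(word_labels, doc_labels):
        w_in = word_labels == k
        d_in = doc_labels == k
        # Volume: sum of the degrees of all vertices in cluster k.
        vol = word_deg[w_in].sum() + doc_deg[d_in].sum()
        # Weight of edges with both endpoints inside cluster k.
        inside = A[np.ix_(w_in, d_in)].sum()
        # Internal edges are counted twice in the volume, cut edges once.
        cut = vol - 2.0 * inside
        if vol > 0:
            total += cut / vol
    return total


# Toy example (hypothetical data): 4 words x 4 documents, two co-clusters.
A_example = [
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 3, 1],
    [0, 1, 1, 2],
]
ncut_example = normalized_cut(A_example, [0, 0, 1, 1], [0, 0, 1, 1])
print(ncut_example)
```

A lower value means fewer and lighter edges cross between co-clusters relative to cluster volume. This is the quantity that, according to the abstract, the partitions found tend to push below that of the original clustering.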
Similar resources
A probabilistic topic model based on local word relationships in overlapping windows
A probabilistic topic model assumes that documents are generated through a process involving topics, and then tries to reverse this process to extract the topics from the given documents. A topic is usually assumed to be a distribution over words. LDA is one of the first and most popular topic models introduced so far. In the document generation process assumed by LDA, each document is a distribution o...
A Hybrid Time Series Clustering Method Based on Fuzzy C-Means Algorithm: An Agreement Based Clustering Approach
In recent years, the advancement of information gathering technologies such as GPS and GSM networks has led to huge, complex datasets such as time series and trajectories. As a result, it is essential to use appropriate methods to analyze the large raw datasets produced. Extracting useful information from large data sets has always been one of the most important challenges in different sciences,...
CoDiNMF: Co-clustering of Directed Graphs via NMF
Co-clustering computes clusters of data items and the related features concurrently, and it has been used in many applications such as community detection, product recommendation, computer vision, and pricing optimization. In this paper, we propose a new co-clustering method, called CoDiNMF, which improves the clustering quality and finds directional patterns among co-clusters by using multiple...
A comparative performance of gray level image thresholding using normalized graph cut based standard S membership function
In this research paper, we use a normalized graph cut measure as a thresholding principle to separate an object from the background based on the standard S membership function. The proposed algorithm is known as the fuzzy normalized graph cut method. It is compared with the fuzzy entropy method [25] and the Kittler [11], Rosin [21], Sauvola [23] and Wolf [33] methods. M...
Regularized Co-Clustering on Manifold
Co-clustering partitions the rows and columns of a matrix simultaneously. It has been an important research field in data mining and machine learning. It is preferred over traditional homogeneous clustering techniques in many real applications. In this paper, we present a co-clustering algorithm based on local information and regularization. The algorithm seeks to preserve the local intrinsic ...
Journal title:
- J. Math. Model. Algorithms
Volume 9
Publication date: 2010